[paper] A Few Useful Things to Know About Machine Learning
Machine learning has a lot of folk wisdom that is usually acquired only by doing it. This paper tries to make some of that wisdom explicit.
Learning = Representation + Evaluation + Optimization
All ML algorithms consist of exactly three components (a small sketch follows the list):
- Representation: the classifier must be expressed in a formal language the computer can handle. Choosing a representation amounts to choosing the set of classifiers (the hypothesis space) the learner can possibly learn.
- Evaluation: an evaluation (scoring) function is needed to distinguish good classifiers from bad ones.
- Optimization: a method to search the space of classifiers for the highest-scoring one.
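A minimal sketch of how the three components line up in practice, assuming scikit-learn; the dataset and hyperparameter grid are illustrative choices, not from the paper:

```python
# Hypothetical illustration of Representation + Evaluation + Optimization.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Representation: decision trees define the hypothesis space we can learn.
representation = DecisionTreeClassifier(random_state=0)

# Evaluation: accuracy scores candidate classifiers (good vs. bad).
# Optimization: grid search over tree depths picks the best-scoring one.
search = GridSearchCV(
    representation,
    param_grid={"max_depth": [2, 4, 8, None]},
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```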
It's Generalization that Counts
Generalization is what counts because it is impossible to see all possible examples at training time. The data must be split into separate training and test sets; otherwise you only measure how well the learner fits the data it has already seen, not how well it generalizes.
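A minimal sketch of a train/test split, assuming scikit-learn; the dataset, model, and split ratio are arbitrary choices:

```python
# Hold out a test set so we estimate generalization, not memorization.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy: ", clf.score(X_test, y_test))  # the number that counts
```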
Data alone is not enough
Even though ML is a data-driven technique, domain knowledge is still required, mainly because no dataset is large enough to cover all possible cases; the learner needs assumptions beyond the data in order to generalize at all.
Overfitting has many faces
Overfitting is the failure to generalize; it can be decomposed into bias and variance, and the ideal is a model with both low bias and low variance. Cross-validation can help detect and avoid overfitting.
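A small sketch of using cross-validation to compare a high-variance model with a lower-variance one; the models and dataset are illustrative assumptions:

```python
# 10-fold cross-validation as a guard against overfitting: an unlimited-depth
# tree fits its training data perfectly, but a shallow tree often
# cross-validates as well or better.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for depth in (None, 3):  # None = grow the tree without limit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, X, y, cv=10)
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
```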
Intuition fails in high dimensions
As the number of dimensions grows, generalizing becomes exponentially harder: a fixed amount of data covers an ever-smaller fraction of the input space, and intuitions formed in two or three dimensions often do not carry over.
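One way to see this numerically (a sketch, not from the paper): for uniformly random points, the gap between the nearest and farthest neighbour shrinks as the dimension grows, so distance-based intuition stops being useful.

```python
# Distance concentration in high dimensions: the nearest/farthest ratio
# approaches 1 as the number of dimensions grows.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.random((500, d))                            # 500 points in [0, 1]^d
    dists = np.linalg.norm(points - points[0], axis=1)[1:]   # drop self-distance
    print(f"d={d:5d}  nearest/farthest distance ratio = "
          f"{dists.min() / dists.max():.3f}")
```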
Theoretical guarantees are not what they seem
ML is induction, so unlike deduction there is no guarantee that the conclusions are correct. And just because a learner has a theoretical justification and works in practice does not mean the former is the reason for the latter.
Feature engineering is the key
Learning is easy when there are many independent features that each correlate well with the class, so constructing and selecting the right features is where most of the effort goes, and it requires domain knowledge.
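A toy sketch of why an engineered feature can matter more than the learner; the task (predicting "is weekend" from a timestamp) and the derived feature are illustrative assumptions:

```python
# Raw timestamps correlate poorly with the label, but a derived day-of-week
# feature makes the same problem almost trivially learnable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
timestamps = rng.integers(0, 10_000_000, size=2000)   # seconds since an epoch
day_of_week = (timestamps // 86_400) % 7              # engineered feature
labels = (day_of_week >= 5).astype(int)               # "is weekend"

raw = (timestamps / 1e7).reshape(-1, 1)               # scaled raw feature
engineered = day_of_week.reshape(-1, 1).astype(float)

for name, X in (("raw timestamp", raw), ("day of week", engineered)):
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean()
    print(f"{name:14s}: CV accuracy = {acc:.2f}")
```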
More data beats a cleverer algorithm
Even once the feature set is fixed, if the model is still not accurate enough there are two options: 1) gather more data, or 2) design a cleverer algorithm. The former is usually the easier and more effective choice.
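A learning curve is one quick way to judge how much head-room is left in "just add more data"; this sketch assumes scikit-learn, and the dataset and model are arbitrary:

```python
# Cross-validated accuracy as a function of training-set size.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
sizes, _, test_scores = learning_curve(
    GaussianNB(), X, y, train_sizes=[0.1, 0.3, 0.6, 1.0], cv=5
)
for n, scores in zip(sizes, test_scores):
    print(f"{n:4d} training examples -> mean CV accuracy {scores.mean():.3f}")
```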
Learn many models, not just one
Instead of selecting the single best model, combining many model variations (e.g., by bagging, boosting, or stacking) often works better. Note that this kind of ensembling is not the same thing as Bayesian model averaging.
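A small sketch comparing a single model with two ways of combining models (bagging via a random forest, and simple voting); the specific models and dataset are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)  # bagged trees
voting = VotingClassifier([
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("logreg", LogisticRegression(max_iter=5000)),
])

for name, model in (("single tree", single),
                    ("random forest", forest),
                    ("voting ensemble", voting)):
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```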
Simplicity does not imply accuracy
Contrary to the common reading of Occam's razor, there is no necessary connection between the number of parameters in a model and its tendency to overfit.
Representable does not imply learnable
Just because a function can be represented by a learner does not mean it can be learned; with finite data, time, and memory, the more important question is whether it is learnable in practice.
Correlation does not imply causation
ML is usually applied to observational data, where the predictive variables are not under the learner's control. Although some learning algorithms can potentially extract causal information from observational data, their applicability is quite restricted. The practical stance is to treat learned correlations as signs of potential causal connections to investigate further, not as causal conclusions.